222 PART 5 Looking for Relationships with Correlation and Regression

Summary statistics for the residuals

If you read about summarizing data in Chapter 9, you know that the distribution

of values from a numerical variable is reported using summary statistics, such as

mean, standard deviation, median, minimum, maximum, and quartiles. Summary

statistics for residuals are what you should expect to find in the residuals section

of your software’s output. Here’s what you see in Figure 16-4 at the top under

Residuals:

» The minimum and maximum values: These are labeled as Min and Max,

respectively, and represent the two most extreme residuals, or the two points that lie

farthest away from the least-squares line in either direction. The minimum is

negative, indicating it is below the line, while the positive maximum is above

the line. The minimum is almost 21 mmHg below the line, while the maximum

lies about 17 mmHg above the line.

» The first and third quartiles: These are labeled 1Q and 3Q on the output.

Looking under 1Q, which is the first quartile, you can tell that about 25 percent

of the data points (which would be 5 out of 20) lie more than 4.7 mmHg below

the fitted line. For the third quartile results, you see that another 25 percent

lie more than 6.5 mmHg above the fitted line. The remaining 50 percent of the

points lie between those two quartiles.

» The median: Labeled Median on the output, a median of –3.4 tells you that half

of the residuals, which is 10 of the 20 data points, are less than –3.4, and half are

greater than –3.4. The negative sign means the median lies below the fitted line.

Note: The mean isn’t included in these summary statistics because the mean of

the residuals is always exactly 0 for any least-squares regression that includes an

intercept term.
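If you want to see where these numbers come from, here is a minimal sketch in Python that fits a least-squares line and summarizes its residuals. The data values and the names fit_line, weight, and sbp are made up for illustration; they are not the data behind Figure 16-4. The quartile rule used here (linear interpolation) is an assumption, and your software may use a slightly different one.

```python
import statistics

# Hypothetical illustrative data (NOT the data behind Figure 16-4):
# weight in kg and systolic blood pressure in mmHg.
weight = [60.0, 65.0, 70.0, 75.0, 80.0, 85.0, 90.0, 95.0]
sbp = [118.0, 125.0, 121.0, 135.0, 130.0, 142.0, 138.0, 150.0]


def fit_line(x, y):
    """Return (intercept, slope) of the least-squares straight line."""
    mean_x = statistics.fmean(x)
    mean_y = statistics.fmean(y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


intercept, slope = fit_line(weight, sbp)

# Residual = observed value minus the value predicted by the fitted line.
residuals = [yi - (intercept + slope * xi) for xi, yi in zip(weight, sbp)]

# Summary laid out like the Min / 1Q / Median / 3Q / Max line of the output.
q1, median, q3 = statistics.quantiles(residuals, n=4, method="inclusive")
summary = {
    "Min": min(residuals),
    "1Q": q1,
    "Median": median,
    "3Q": q3,
    "Max": max(residuals),
}
for label, value in summary.items():
    print(f"{label:>6}: {value:8.3f}")

# With an intercept in the model, the residuals average to 0 (up to rounding).
print("Mean is essentially 0:", abs(statistics.fmean(residuals)) < 1e-9)
```

The minimum comes out negative and the maximum positive, which matches the idea that the most extreme points sit on opposite sides of the fitted line.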

The residual standard error, often called the root-mean-square (RMS) error in regres-

sion output, is a measure of how tightly or loosely the points scatter above or

below the fitted line. You can think of it as the standard deviation (SD) of the resid-

uals, although it’s computed in a slightly different way from the usual SD of a set

of numbers. RMS uses N – 2 instead of N – 1 in the denominator of the SD

formula. At the bottom of Figure  16-4, Residual standard error is expressed as

9.838 mmHg. You can think of it as another summary statistic for residuals.
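The difference between the two denominators is easy to see in a small sketch. The residuals below are hypothetical values chosen to sum to 0, and the function name residual_standard_error is made up for this example. The divisor is N – 2 for a straight line because two parameters (the intercept and the slope) are estimated from the data.

```python
import math
import statistics

# Hypothetical residuals from a straight-line fit (illustrative only).
# They sum to 0, as residuals from a model with an intercept do.
residuals = [-6.0, -3.5, -1.0, 0.5, 2.0, 3.0, 5.0]


def residual_standard_error(resids, n_params=2):
    """Square root of the sum of squared residuals over N - n_params."""
    n = len(resids)
    sum_sq = sum(r ** 2 for r in resids)
    return math.sqrt(sum_sq / (n - n_params))


# Usual sample SD: divides by N - 1.
usual_sd = statistics.stdev(residuals)

# Residual standard error for a straight line: divides by N - 2.
residual_se = residual_standard_error(residuals)

print(f"Usual SD (N - 1):                {usual_sd:.3f}")
print(f"Residual standard error (N - 2): {residual_se:.3f}")
```

Because it divides by a smaller number, the residual standard error is always a little larger than the usual SD of the same residuals, and the gap shrinks as N grows.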

Graphs of the residuals

Most regression programs can produce several kinds of residual graphs if you

request them in your code. You can use these graphs to assess whether the data meet the

assumptions of a least-squares straight-line regression. Figure 16-6 shows

two of the more common types of residual graphs. The one on the left is called a

residuals versus fitted graph, and the one on the right is called a normal Q-Q graph.
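To make the two graphs less mysterious, here is a rough sketch that computes the points each one would plot, without drawing anything. The data are hypothetical (not the data behind Figure 16-6). Scaling the residuals by the residual standard error and using plotting positions of (i – 0.5)/N are simplifying assumptions; your software may standardize the residuals and choose the positions differently.

```python
import math
from statistics import NormalDist, fmean

# Hypothetical illustrative data (NOT the data behind Figure 16-6).
x = [60.0, 65.0, 70.0, 75.0, 80.0, 85.0, 90.0, 95.0]
y = [118.0, 125.0, 121.0, 135.0, 130.0, 142.0, 138.0, 150.0]
n = len(x)

# Least-squares straight line.
mean_x, mean_y = fmean(x), fmean(y)
sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
sxx = sum((xi - mean_x) ** 2 for xi in x)
slope = sxy / sxx
intercept = mean_y - slope * mean_x

fitted = [intercept + slope * xi for xi in x]
residuals = [yi - fi for yi, fi in zip(y, fitted)]

# Residuals versus fitted: one (fitted value, residual) point per observation.
resid_vs_fitted = list(zip(fitted, residuals))

# Normal Q-Q: sorted residuals, scaled by the residual standard error,
# paired with standard normal quantiles at positions (i - 0.5) / n.
residual_se = math.sqrt(sum(r ** 2 for r in residuals) / (n - 2))
scaled = sorted(r / residual_se for r in residuals)
theoretical = [NormalDist().inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
normal_qq = list(zip(theoretical, scaled))

print("Residuals versus fitted:")
for f, r in resid_vs_fitted:
    print(f"  fitted {f:7.2f}   residual {r:6.2f}")

print("Normal Q-Q:")
for t, s in normal_qq:
    print(f"  theoretical {t:6.3f}   observed {s:6.3f}")
```

Each pair in resid_vs_fitted is one point on the left-hand graph, and each pair in normal_qq is one point on the right-hand graph.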